{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "250def3f",
   "metadata": {},
   "source": [
    "# ChEMBL webresource client examples\n",
    "\n",
    "The library provides access to ChEMBL data and cheminformatics tools from Python. You don't need to know SQL or how to interact with REST APIs, and you don't need to compile or install any cheminformatics frameworks. Results are cached.\n",
    "\n",
    "The client handles all communication over HTTPS and caches results in the local file system for faster retrieval. By abstracting away network-related tasks, it gives the end user a convenient interface and the impression of working with a local resource. Its design is based on the Django QuerySet interface. The client also evaluates results lazily: a request for data is sent only when a value is actually required, which reduces the number of network requests and improves performance."
   ]
  },
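  {
   "cell_type": "markdown",
   "id": "c4e1a7b2",
   "metadata": {},
   "source": [
    "The lazy evaluation described above can be illustrated with a small stand-alone sketch. `LazyQuery` is a hypothetical toy class written for this notebook, not the client's implementation: it assumes only that the expensive request is deferred until data is first needed and that the result is then reused from memory.\n",
    "\n",
    "```python\n",
    "class LazyQuery:\n",
    "    # Toy stand-in for a lazily evaluated query (hypothetical, not the client's code).\n",
    "\n",
    "    def __init__(self, fetch):\n",
    "        self._fetch = fetch        # callable that performs the expensive request\n",
    "        self._cache = None         # filled on first access\n",
    "        self.requests_made = 0\n",
    "\n",
    "    def _results(self):\n",
    "        # Run the request only once, the first time data is actually needed.\n",
    "        if self._cache is None:\n",
    "            self._cache = list(self._fetch())\n",
    "            self.requests_made += 1\n",
    "        return self._cache\n",
    "\n",
    "    def __iter__(self):\n",
    "        return iter(self._results())\n",
    "\n",
    "    def __len__(self):\n",
    "        return len(self._results())\n",
    "\n",
    "\n",
    "query = LazyQuery(lambda: ['CHEMBL25', 'CHEMBL192'])\n",
    "built_without_request = query.requests_made == 0   # nothing fetched yet\n",
    "ids = list(query)                                  # first use triggers the fetch\n",
    "n = len(query)                                     # served from the stored result\n",
    "```\n",
    "\n",
    "Building the query costs nothing; the first iteration performs the single fetch, and later calls such as `len(query)` reuse the stored result."
   ]
  },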
  {
   "cell_type": "markdown",
   "id": "e3dde01c",
   "metadata": {},
   "source": [
    "## Available data entities\n",
    "\n",
    "You can list the available data entities with the following code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "6f02c0c2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['activity', 'activity_supplementary_data_by_activity', 'assay', 'assay_class', 'atc_class', 'binding_site', 'biotherapeutic', 'cell_line', 'chembl_id_lookup', 'compound_record', 'compound_structural_alert', 'description', 'document', 'document_similarity', 'drug', 'drug_indication', 'drug_warning', 'go_slim', 'image', 'mechanism', 'metabolism', 'molecule', 'molecule_form', 'official', 'organism', 'protein_class', 'similarity', 'source', 'substructure', 'target', 'target_component', 'target_relation', 'tissue', 'xref_source']\n"
     ]
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "available_resources = [resource for resource in dir(new_client) if not resource.startswith('_')]\n",
    "print(available_resources)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4362a997",
   "metadata": {},
   "source": [
    "## Available filters\n",
    "\n",
    "The design of the client is based on the Django QuerySet interface (https://docs.djangoproject.com/en/1.11/ref/models/querysets), and the most important lookup types are supported. These are:\n",
    "\n",
    "- exact\n",
    "- iexact\n",
    "- contains\n",
    "- icontains\n",
    "- in\n",
    "- gt\n",
    "- gte\n",
    "- lt\n",
    "- lte\n",
    "- startswith\n",
    "- istartswith\n",
    "- endswith\n",
    "- iendswith\n",
    "- range\n",
    "- isnull\n",
    "- regex\n",
    "- iregex"
   ]
  },
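  {
   "cell_type": "markdown",
   "id": "d7b3f015",
   "metadata": {},
   "source": [
    "A lookup type is appended to a field name with a double underscore, and nested fields are joined the same way, as in `pref_name__iexact='aspirin'`. The helper below is a hypothetical, stand-alone illustration of that naming convention; `LOOKUPS` and `split_filter_keyword` are not part of the client. It assumes, as in Django, that a keyword without a lookup suffix means `exact`.\n",
    "\n",
    "```python\n",
    "LOOKUPS = {\n",
    "    'exact', 'iexact', 'contains', 'icontains', 'in', 'gt', 'gte', 'lt', 'lte',\n",
    "    'startswith', 'istartswith', 'endswith', 'iendswith', 'range', 'isnull',\n",
    "    'regex', 'iregex',\n",
    "}\n",
    "\n",
    "\n",
    "def split_filter_keyword(keyword):\n",
    "    # Split 'field__subfield__lookup' into (field path, lookup type).\n",
    "    # Assumption for this sketch: no recognised suffix means 'exact'.\n",
    "    parts = keyword.split('__')\n",
    "    if len(parts) > 1 and parts[-1] in LOOKUPS:\n",
    "        return parts[:-1], parts[-1]\n",
    "    return parts, 'exact'\n",
    "\n",
    "\n",
    "print(split_filter_keyword('pref_name__iexact'))\n",
    "print(split_filter_keyword('molecule_chembl_id__in'))\n",
    "print(split_filter_keyword('molecule_structures__standard_inchi_key'))\n",
    "```\n",
    "\n",
    "The three keywords are the ones passed to `filter` in the molecule examples in this notebook."
   ]
  },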
  {
   "cell_type": "markdown",
   "id": "4404c0d9",
   "metadata": {},
   "source": [
    "## Only operator\n",
    "\n",
    "`only` is a special method that limits the results to a selected set of fields. It takes a single argument: a list of fields to include in the result. The specified fields must exist in the endpoint against which `only` is executed. Using `only` usually makes an API call faster because less data is returned, which saves bandwidth. The API also checks which SQL joins are needed to return the specified fields and skips the unnecessary ones, which can substantially improve performance.\n",
    "\n",
    "Please note that `only` has one limitation: nested fields in the list are ignored, i.e. calling `only(['molecule_properties__alogp'])` is equivalent to `only(['molecule_properties'])`.\n",
    "\n",
    "For many-to-many relationships, `only` does not perform any SQL join optimisation."
   ]
  },
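  {
   "cell_type": "markdown",
   "id": "e82c6a49",
   "metadata": {},
   "source": [
    "The nested-field limitation can be modelled in a few lines of plain Python. `effective_only_fields` is a hypothetical helper written for illustration, not a client function; it simply reduces each requested path to its top-level field, which is the behaviour described above.\n",
    "\n",
    "```python\n",
    "def effective_only_fields(fields):\n",
    "    # Model of the documented limitation: nested paths collapse to their top-level field.\n",
    "    top_level = []\n",
    "    for field in fields:\n",
    "        name = field.split('__')[0]\n",
    "        if name not in top_level:      # keep order, drop duplicates\n",
    "            top_level.append(name)\n",
    "    return top_level\n",
    "\n",
    "\n",
    "requested = ['molecule_chembl_id', 'molecule_properties__alogp']\n",
    "print(effective_only_fields(requested))\n",
    "```\n",
    "\n",
    "In practice this means that asking for a single nested property still returns the whole parent object, so any further narrowing has to be done on the client side."
   ]
  },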
  {
   "cell_type": "markdown",
   "id": "7158f374",
   "metadata": {},
   "source": [
    "# Molecules\n",
    "\n",
    "Molecule records may be retrieved in a number of ways, such as lookup of single molecules using various identifiers or searching for compounds via similarity."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "563391e5",
   "metadata": {},
   "source": [
    "## Find a molecule by pref_name\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "76fbd5a3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'atc_classifications': ['B01AC06', 'N02BA01', 'N02BA51', 'A01AD05', 'N02BA71'], 'availability_type': 2, 'biotherapeutic': None, 'black_box_warning': 0, 'chebi_par_id': 15365, 'chirality': 2, 'cross_references': [{'xref_id': 'aspirin', 'xref_name': 'aspirin', 'xref_src': 'DailyMed'}, {'xref_id': '144203627', 'xref_name': 'SID: 144203627', 'xref_src': 'PubChem'}, {'xref_id': '144209315', 'xref_name': 'SID: 144209315', 'xref_src': 'PubChem'}, {'xref_id': '144210466', 'xref_name': 'SID: 144210466', 'xref_src': 'PubChem'}, {'xref_id': '170465039', 'xref_name': 'SID: 170465039', 'xref_src': 'PubChem'}, {'xref_id': '17389202', 'xref_name': 'SID: 17389202', 'xref_src': 'PubChem'}, {'xref_id': '17390036', 'xref_name': 'SID: 17390036', 'xref_src': 'PubChem'}, {'xref_id': '174007205', 'xref_name': 'SID: 174007205', 'xref_src': 'PubChem'}, {'xref_id': '26747283', 'xref_name': 'SID: 26747283', 'xref_src': 'PubChem'}, {'xref_id': '26752858', 'xref_name': 'SID: 26752858', 'xref_src': 'PubChem'}, {'xref_id': '47193676', 'xref_name': 'SID: 47193676', 'xref_src': 'PubChem'}, {'xref_id': '50105490', 'xref_name': 'SID: 50105490', 'xref_src': 'PubChem'}, {'xref_id': '85230910', 'xref_name': 'SID: 85230910', 'xref_src': 'PubChem'}, {'xref_id': '87798', 'xref_name': 'SID: 87798', 'xref_src': 'PubChem'}, {'xref_id': '90340586', 'xref_name': 'SID: 90340586', 'xref_src': 'PubChem'}, {'xref_id': '14', 'xref_name': 'aspirin', 'xref_src': 'TG-GATEs'}, {'xref_id': 'Aspirin', 'xref_name': None, 'xref_src': 'Wikipedia'}], 'dosed_ingredient': True, 'first_approval': 1950, 'first_in_class': 0, 'helm_notation': None, 'indication_class': 'Analgesic; Antirheumatic; Antipyretic', 'inorganic_flag': 0, 'max_phase': 4, 'molecule_chembl_id': 'CHEMBL25', 'molecule_hierarchy': {'molecule_chembl_id': 'CHEMBL25', 'parent_chembl_id': 'CHEMBL25'}, 'molecule_properties': {'alogp': '1.31', 'aromatic_rings': 1, 'cx_logd': '-2.16', 'cx_logp': '1.24', 'cx_most_apka': '3.41', 'cx_most_bpka': None, 
'full_molformula': 'C9H8O4', 'full_mwt': '180.16', 'hba': 3, 'hba_lipinski': 4, 'hbd': 1, 'hbd_lipinski': 1, 'heavy_atoms': 13, 'molecular_species': 'ACID', 'mw_freebase': '180.16', 'mw_monoisotopic': '180.0423', 'num_lipinski_ro5_violations': 0, 'num_ro5_violations': 0, 'psa': '63.60', 'qed_weighted': '0.55', 'ro3_pass': 'N', 'rtb': 2}, 'molecule_structures': {'canonical_smiles': 'CC(=O)Oc1ccccc1C(=O)O', 'molfile': '\\n     RDKit          2D\\n\\n 13 13  0  0  0  0  0  0  0  0999 V2000\\n    8.8810   -2.1206    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    8.8798   -2.9479    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    9.5946   -3.3607    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   10.3110   -2.9474    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   10.3081   -2.1170    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    9.5928   -1.7078    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   11.0210   -1.7018    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   11.7369   -2.1116    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n   11.0260   -3.3588    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n   11.0273   -4.1837    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   11.7423   -4.5949    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   10.3136   -4.5972    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n   11.0178   -0.8769    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n  1  2  2  0\\n  5  7  1  0\\n  3  4  2  0\\n  7  8  2  0\\n  4  9  1  0\\n  4  5  1  0\\n  9 10  1  0\\n  2  3  1  0\\n 10 11  1  0\\n  5  6  2  0\\n 10 12  2  0\\n  6  1  1  0\\n  7 13  1  0\\nM  END\\n\\n> <chembl_id>\\nCHEMBL25\\n\\n> <chembl_pref_name>\\nASPIRIN\\n\\n', 'standard_inchi': 'InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)', 'standard_inchi_key': 'BSYNRYMUTXBXSQ-UHFFFAOYSA-N'}, 'molecule_synonyms': [{'molecule_synonym': '8-hour bayer', 'syn_type': 'TRADE_NAME', 'synonyms': '8-HOUR BAYER'}, {'molecule_synonym': 'Acetosalic 
Acid', 'syn_type': 'TRADE_NAME', 'synonyms': 'Acetosalic Acid'}, {'molecule_synonym': 'Acetylsalic acid', 'syn_type': 'TRADE_NAME', 'synonyms': 'ACETYLSALIC ACID'}, {'molecule_synonym': 'Acetylsalicylic Acid', 'syn_type': 'INN', 'synonyms': 'Acetylsalicylic Acid'}, {'molecule_synonym': 'Acetylsalicylic Acid', 'syn_type': 'TRADE_NAME', 'synonyms': 'Acetylsalicylic Acid'}, {'molecule_synonym': 'Acetylsalicylic acid', 'syn_type': 'ATC', 'synonyms': 'ACETYLSALICYLIC ACID'}, {'molecule_synonym': 'Acetylsalicylic acid', 'syn_type': 'OTHER', 'synonyms': 'ACETYLSALICYLIC ACID'}, {'molecule_synonym': 'Alka rapid', 'syn_type': 'TRADE_NAME', 'synonyms': 'ALKA RAPID'}, {'molecule_synonym': 'Anadin all night', 'syn_type': 'TRADE_NAME', 'synonyms': 'ANADIN ALL NIGHT'}, {'molecule_synonym': 'Angettes 75', 'syn_type': 'TRADE_NAME', 'synonyms': 'ANGETTES 75'}, {'molecule_synonym': 'Aspirin', 'syn_type': 'USAN', 'synonyms': 'Aspirin'}, {'molecule_synonym': 'Aspirin', 'syn_type': 'BAN', 'synonyms': 'ASPIRIN'}, {'molecule_synonym': 'Aspirin', 'syn_type': 'BNF', 'synonyms': 'ASPIRIN'}, {'molecule_synonym': 'Aspirin', 'syn_type': 'FDA', 'synonyms': 'ASPIRIN'}, {'molecule_synonym': 'Aspirin', 'syn_type': 'JAN', 'synonyms': 'ASPIRIN'}, {'molecule_synonym': 'Aspirin', 'syn_type': 'MERCK_INDEX', 'synonyms': 'ASPIRIN'}, {'molecule_synonym': 'Aspirin', 'syn_type': 'OTHER', 'synonyms': 'ASPIRIN'}, {'molecule_synonym': 'Aspirin', 'syn_type': 'TRADE_NAME', 'synonyms': 'ASPIRIN'}, {'molecule_synonym': 'Aspirin', 'syn_type': 'USP', 'synonyms': 'ASPIRIN'}, {'molecule_synonym': 'Aspro clr', 'syn_type': 'TRADE_NAME', 'synonyms': 'ASPRO CLR'}, {'molecule_synonym': 'BAY1019036', 'syn_type': 'RESEARCH_CODE', 'synonyms': 'BAY1019036'}, {'molecule_synonym': 'Bayer extra strength aspirin for migraine pain', 'syn_type': 'TRADE_NAME', 'synonyms': 'BAYER EXTRA STRENGTH ASPIRIN FOR MIGRAINE PAIN'}, {'molecule_synonym': 'Danamep', 'syn_type': 'TRADE_NAME', 'synonyms': 'DANAMEP'}, {'molecule_synonym': 'Disprin 
cv', 'syn_type': 'TRADE_NAME', 'synonyms': 'DISPRIN CV'}, {'molecule_synonym': 'Disprin direct', 'syn_type': 'TRADE_NAME', 'synonyms': 'DISPRIN DIRECT'}, {'molecule_synonym': 'Durlaza', 'syn_type': 'TRADE_NAME', 'synonyms': 'DURLAZA'}, {'molecule_synonym': 'Ecotrin', 'syn_type': 'TRADE_NAME', 'synonyms': 'Ecotrin'}, {'molecule_synonym': 'Enprin', 'syn_type': 'TRADE_NAME', 'synonyms': 'ENPRIN'}, {'molecule_synonym': 'Equi-Prin', 'syn_type': 'TRADE_NAME', 'synonyms': 'Equi-Prin'}, {'molecule_synonym': 'Gencardia', 'syn_type': 'TRADE_NAME', 'synonyms': 'GENCARDIA'}, {'molecule_synonym': 'Levius', 'syn_type': 'TRADE_NAME', 'synonyms': 'LEVIUS'}, {'molecule_synonym': 'Max strgh aspro clr', 'syn_type': 'TRADE_NAME', 'synonyms': 'MAX STRGH ASPRO CLR'}, {'molecule_synonym': 'Measurin', 'syn_type': 'TRADE_NAME', 'synonyms': 'MEASURIN'}, {'molecule_synonym': 'Micropirin ec', 'syn_type': 'TRADE_NAME', 'synonyms': 'MICROPIRIN EC'}, {'molecule_synonym': 'NSC-27223', 'syn_type': 'RESEARCH_CODE', 'synonyms': 'NSC-27223'}, {'molecule_synonym': 'NSC-406186', 'syn_type': 'RESEARCH_CODE', 'synonyms': 'NSC-406186'}, {'molecule_synonym': 'Nu-seals 300', 'syn_type': 'TRADE_NAME', 'synonyms': 'NU-SEALS 300'}, {'molecule_synonym': 'Nu-seals 600', 'syn_type': 'TRADE_NAME', 'synonyms': 'NU-SEALS 600'}, {'molecule_synonym': 'Nu-seals 75', 'syn_type': 'TRADE_NAME', 'synonyms': 'NU-SEALS 75'}, {'molecule_synonym': 'Nu-seals cardio 75', 'syn_type': 'TRADE_NAME', 'synonyms': 'NU-SEALS CARDIO 75'}, {'molecule_synonym': 'Paynocil', 'syn_type': 'TRADE_NAME', 'synonyms': 'PAYNOCIL'}, {'molecule_synonym': 'Platet', 'syn_type': 'TRADE_NAME', 'synonyms': 'PLATET'}, {'molecule_synonym': 'Platet 300', 'syn_type': 'TRADE_NAME', 'synonyms': 'PLATET 300'}, {'molecule_synonym': 'Postmi 300', 'syn_type': 'TRADE_NAME', 'synonyms': 'POSTMI 300'}, {'molecule_synonym': 'Postmi 75', 'syn_type': 'TRADE_NAME', 'synonyms': 'POSTMI 75'}, {'molecule_synonym': 'Salicylic Acid Acetate', 'syn_type': 'TRADE_NAME', 
'synonyms': 'Salicylic Acid Acetate'}, {'molecule_synonym': 'Vazalore', 'syn_type': 'TRADE_NAME', 'synonyms': 'VAZALORE'}], 'molecule_type': 'Small molecule', 'natural_product': 0, 'oral': True, 'parenteral': False, 'polymer_flag': False, 'pref_name': 'ASPIRIN', 'prodrug': 0, 'structure_type': 'MOL', 'therapeutic_flag': True, 'topical': False, 'usan_stem': None, 'usan_stem_definition': None, 'usan_substem': None, 'usan_year': None, 'withdrawn_class': None, 'withdrawn_country': None, 'withdrawn_flag': False, 'withdrawn_reason': None, 'withdrawn_year': None}]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "molecule = new_client.molecule\n",
    "mols = molecule.filter(pref_name__iexact='aspirin')\n",
    "mols"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "55236717",
   "metadata": {},
   "source": [
    "## Find a molecule by its synonyms\n",
    "\n",
    "- Useful when a molecule is not found by `pref_name`.\n",
    "- The `only` method restricts the response to the fields you specify."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "de2034b7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'molecule_chembl_id': 'CHEMBL192'}, {'molecule_chembl_id': 'CHEMBL1737'}]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "molecule = new_client.molecule\n",
    "mols = molecule.filter(molecule_synonyms__molecule_synonym__iexact='viagra').only('molecule_chembl_id')\n",
    "mols"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2698ba2d",
   "metadata": {},
   "source": [
    "## Get a single molecule by ChEMBL id\n",
    "\n",
    "All the main entities in the ChEMBL database have a ChEMBL ID. It is a stable identifier designed for straightforward lookup of data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "f71fa26f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'molecule_chembl_id': 'CHEMBL192', 'molecule_structures': {'canonical_smiles': 'CCCc1nn(C)c2c(=O)[nH]c(-c3cc(S(=O)(=O)N4CCN(C)CC4)ccc3OCC)nc12', 'molfile': '\\n     RDKit          2D\\n\\n 33 36  0  0  0  0  0  0  0  0999 V2000\\n    2.1000   -0.0042    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    2.1000    0.7000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -1.5375   -0.0042    0.0000 S   0  0  0  0  0  0  0  0  0  0  0  0\\n    1.4917   -0.3667    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n    0.8792   -0.0042    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    2.8042    0.9083    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n    1.4917    1.0625    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    0.8792    0.6833    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n    3.2042    0.3458    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n    2.8042   -0.2417    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    0.2875   -0.3750    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -2.1583   -0.3750    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n   -0.9333   -0.3750    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -0.3208   -0.0333    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -1.1875    0.6083    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n   -1.8958    0.6083    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n   -3.3958   -1.0917    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n   -2.7833   -0.0042    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -2.1583   -1.0917    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    0.2875   -1.1125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    1.4917    1.7708    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n   -0.9333   -1.1125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -0.3208   -1.4542    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -3.3958   -0.3750    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -2.7833   -1.4417    0.0000 
C   0  0  0  0  0  0  0  0  0  0  0  0\\n    3.0750    1.5750    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    2.8042   -0.9500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    0.8792   -1.4542    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n   -3.9958   -1.4292    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    1.4958   -1.1000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    3.4167   -1.3125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    2.1125   -1.4500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    4.0375   -0.9542    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n  2  1  2  0\\n  3 13  1  0\\n  4  1  1  0\\n  5  4  2  0\\n  6  2  1  0\\n  7  2  1  0\\n  8  5  1  0\\n  9 10  2  0\\n 10  1  1  0\\n 11  5  1  0\\n 12  3  1  0\\n 13 14  2  0\\n 14 11  1  0\\n 15  3  2  0\\n 16  3  2  0\\n 17 25  1  0\\n 18 12  1  0\\n 19 12  1  0\\n 20 11  2  0\\n 21  7  2  0\\n 22 23  2  0\\n 23 20  1  0\\n 24 18  1  0\\n 25 19  1  0\\n 26  6  1  0\\n 27 10  1  0\\n 28 20  1  0\\n 29 17  1  0\\n 30 28  1  0\\n 31 27  1  0\\n 32 30  1  0\\n 33 31  1  0\\n  9  6  1  0\\n  8  7  1  0\\n 22 13  1  0\\n 17 24  1  0\\nM  END\\n\\n> <chembl_id>\\nCHEMBL192\\n\\n> <chembl_pref_name>\\nSILDENAFIL\\n\\n', 'standard_inchi': 'InChI=1S/C22H30N6O4S/c1-5-7-17-19-20(27(4)25-17)22(29)24-21(23-19)16-14-15(8-9-18(16)32-6-2)33(30,31)28-12-10-26(3)11-13-28/h8-9,14H,5-7,10-13H2,1-4H3,(H,23,24,29)', 'standard_inchi_key': 'BNRNXUUZRGQAQC-UHFFFAOYSA-N'}, 'pref_name': 'SILDENAFIL'}]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "molecule = new_client.molecule\n",
    "m1 = molecule.filter(chembl_id='CHEMBL192').only(['molecule_chembl_id', 'pref_name', 'molecule_structures'])\n",
    "m1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2b5f7079",
   "metadata": {},
   "source": [
    "## Get many molecules by id"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "48183d1b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'molecule_chembl_id': 'CHEMBL25', 'pref_name': 'ASPIRIN'}, {'molecule_chembl_id': 'CHEMBL27', 'pref_name': 'PROPRANOLOL'}, {'molecule_chembl_id': 'CHEMBL192', 'pref_name': 'SILDENAFIL'}]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "molecule = new_client.molecule\n",
    "mols = molecule.filter(molecule_chembl_id__in=['CHEMBL25', 'CHEMBL192', 'CHEMBL27']).only(['molecule_chembl_id', 'pref_name'])\n",
    "mols"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25502759",
   "metadata": {},
   "source": [
    "## Display a molecule image"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "8f2ccf9d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:rdkit=\"http://www.rdkit.org/xml\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" version=\"1.1\" baseProfile=\"full\" xml:space=\"preserve\" width=\"500px\" height=\"500px\" viewBox=\"0 0 500 500\">\n",
       "<!-- END OF HEADER -->\n",
       "<path class=\"bond-0\" d=\"M 61.3426,174.682 L 61.196,275.762\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-0\" d=\"M 83.3129,189.876 L 83.2103,260.632\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-11\" d=\"M 61.3426,174.682 L 148.31,124.246\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-7\" d=\"M 61.196,275.762 L 148.53,326.197\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-2\" d=\"M 148.53,326.197 L 236.06,275.701\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-2\" d=\"M 150.67,299.573 L 211.94,264.225\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-4\" d=\"M 236.06,275.701 L 276.237,298.818\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-4\" d=\"M 276.237,298.818 L 316.413,321.935\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-5\" d=\"M 236.06,275.701 L 235.705,174.242\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-1\" d=\"M 235.705,174.242 L 322.807,123.513\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-9\" d=\"M 235.705,174.242 L 148.31,124.246\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-9\" d=\"M 211.676,185.832 L 150.499,150.835\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-3\" d=\"M 317.345,133.057 L 357.576,156.086\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-3\" d=\"M 357.576,156.086 L 397.808,179.116\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-3\" d=\"M 328.27,113.97 L 368.502,137\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-3\" d=\"M 368.502,137 L 408.734,160.03\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-12\" d=\"M 322.807,123.513 L 322.626,76.8703\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-12\" d=\"M 322.626,76.8703 L 322.445,30.2273\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-6\" d=\"M 323.43,333.465 L 323.504,380.108\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-6\" d=\"M 323.504,380.108 L 323.577,426.751\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-8\" d=\"M 323.577,426.751 L 410.936,476.992\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-10\" d=\"M 318.065,417.237 L 277.967,440.468\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-10\" d=\"M 277.967,440.468 L 237.87,463.7\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-10\" d=\"M 329.09,436.266 L 288.992,459.497\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-10\" d=\"M 288.992,459.497 L 248.895,482.729\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<text x=\"403.271\" y=\"181.083\" style=\"font-size:15px;font-style:normal;font-weight:normal;fill-opacity:1;stroke:none;font-family:sans-serif;text-anchor:start;fill:#FF0000\"><tspan>O</tspan></text>\n",
       "<text x=\"316.413\" y=\"333.465\" style=\"font-size:15px;font-style:normal;font-weight:normal;fill-opacity:1;stroke:none;font-family:sans-serif;text-anchor:start;fill:#FF0000\"><tspan>O</tspan></text>\n",
       "<text x=\"229.373\" y=\"484.773\" style=\"font-size:15px;font-style:normal;font-weight:normal;fill-opacity:1;stroke:none;font-family:sans-serif;text-anchor:start;fill:#FF0000\"><tspan>O</tspan></text>\n",
       "<text x=\"308.911\" y=\"30.2273\" style=\"font-size:15px;font-style:normal;font-weight:normal;fill-opacity:1;stroke:none;font-family:sans-serif;text-anchor:start;fill:#FF0000\"><tspan>OH</tspan></text>\n",
       "</svg>"
      ],
      "text/plain": [
       "<IPython.core.display.SVG object>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "from IPython.display import SVG\n",
    "\n",
    "image = new_client.image\n",
    "image.set_format('svg')\n",
    "SVG(image.get('CHEMBL25'))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1b9d185b",
   "metadata": {},
   "source": [
    "## Get a single molecule by standard InChI key"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "00e75dfb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'molecule_chembl_id': 'CHEMBL25', 'molecule_structures': {'canonical_smiles': 'CC(=O)Oc1ccccc1C(=O)O', 'molfile': '\\n     RDKit          2D\\n\\n 13 13  0  0  0  0  0  0  0  0999 V2000\\n    8.8810   -2.1206    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    8.8798   -2.9479    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    9.5946   -3.3607    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   10.3110   -2.9474    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   10.3081   -2.1170    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    9.5928   -1.7078    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   11.0210   -1.7018    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   11.7369   -2.1116    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n   11.0260   -3.3588    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n   11.0273   -4.1837    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   11.7423   -4.5949    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   10.3136   -4.5972    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n   11.0178   -0.8769    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n  1  2  2  0\\n  5  7  1  0\\n  3  4  2  0\\n  7  8  2  0\\n  4  9  1  0\\n  4  5  1  0\\n  9 10  1  0\\n  2  3  1  0\\n 10 11  1  0\\n  5  6  2  0\\n 10 12  2  0\\n  6  1  1  0\\n  7 13  1  0\\nM  END\\n\\n> <chembl_id>\\nCHEMBL25\\n\\n> <chembl_pref_name>\\nASPIRIN\\n\\n', 'standard_inchi': 'InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)', 'standard_inchi_key': 'BSYNRYMUTXBXSQ-UHFFFAOYSA-N'}, 'pref_name': 'ASPIRIN'}]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "molecule = new_client.molecule\n",
    "mol = molecule.filter(molecule_structures__standard_inchi_key='BSYNRYMUTXBXSQ-UHFFFAOYSA-N').only(['molecule_chembl_id', 'pref_name', 'molecule_structures'])\n",
    "mol"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fbb85bd6",
   "metadata": {},
   "source": [
    "## Find compounds similar to a given SMILES query with a similarity threshold of 70%"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "ed8db8a2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'molecule_chembl_id': 'CHEMBL477888', 'similarity': '85.4166686534881591796875'}\n",
      "{'molecule_chembl_id': 'CHEMBL477889', 'similarity': '85.4166686534881591796875'}\n",
      "{'molecule_chembl_id': 'CHEMBL478779', 'similarity': '85.4166686534881591796875'}\n",
      "{'molecule_chembl_id': 'CHEMBL2304268', 'similarity': '70.1754391193389892578125'}\n"
     ]
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "similarity = new_client.similarity\n",
    "res = similarity.filter(smiles=\"CO[C@@H](CCC#C\\C=C/CCCC(C)CCCCC=C)C(=O)[O-]\", similarity=70).only(['molecule_chembl_id', 'similarity'])\n",
    "for i in res:\n",
    "    print(i)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "547e0986",
   "metadata": {},
   "source": [
    "## Find compounds similar to aspirin (CHEMBL25) with a similarity threshold of 70%\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "a216cbc2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'molecule_chembl_id': 'CHEMBL2296002', 'pref_name': None, 'similarity': '100'}, {'molecule_chembl_id': 'CHEMBL1697753', 'pref_name': 'ASPIRIN DL-LYSINE', 'similarity': '100'}, {'molecule_chembl_id': 'CHEMBL3833325', 'pref_name': 'CARBASPIRIN CALCIUM', 'similarity': '88.8888895511627197265625'}, {'molecule_chembl_id': 'CHEMBL3833404', 'pref_name': 'CARBASPIRIN', 'similarity': '88.8888895511627197265625'}, '...(remaining elements truncated)...']"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "similarity = new_client.similarity\n",
    "res = similarity.filter(chembl_id='CHEMBL25', similarity=70).only(['molecule_chembl_id', 'pref_name', 'similarity'])\n",
    "res"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "79c08ade",
   "metadata": {},
   "source": [
    "## Find compounds with the same connectivity"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "8df8d583",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'molecule_chembl_id': 'CHEMBL1431', 'pref_name': 'METFORMIN'}\n",
      "{'molecule_chembl_id': 'CHEMBL1703', 'pref_name': 'METFORMIN HYDROCHLORIDE'}\n",
      "{'molecule_chembl_id': 'CHEMBL3094198', 'pref_name': None}\n"
     ]
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "molecule = new_client.molecule\n",
    "res = molecule.filter(molecule_structures__canonical_smiles__connectivity='CN(C)C(=N)N=C(N)N').only(['molecule_chembl_id', 'pref_name'])\n",
    "for i in res:\n",
    "    print(i)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12ab1f77",
   "metadata": {},
   "source": [
    "## Get all approved drugs\n",
    "\n",
    "Use `order_by` to sort them by molecular weight."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "647ed1ec",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'atc_classifications': ['V03AN03'], 'availability_type': 1, 'biotherapeutic': None, 'black_box_warning': 0, 'chebi_par_id': 30217, 'chirality': 2, 'cross_references': [], 'dosed_ingredient': True, 'first_approval': 2015, 'first_in_class': 0, 'helm_notation': None, 'indication_class': 'Gases, Diluent for', 'inorganic_flag': 1, 'max_phase': 4, 'molecule_chembl_id': 'CHEMBL1796997', 'molecule_hierarchy': {'molecule_chembl_id': 'CHEMBL1796997', 'parent_chembl_id': 'CHEMBL1796997'}, 'molecule_properties': {'alogp': None, 'aromatic_rings': None, 'cx_logd': None, 'cx_logp': None, 'cx_most_apka': None, 'cx_most_bpka': None, 'full_molformula': 'He', 'full_mwt': '4.00', 'hba': None, 'hba_lipinski': None, 'hbd': None, 'hbd_lipinski': None, 'heavy_atoms': None, 'molecular_species': None, 'mw_freebase': '4.00', 'mw_monoisotopic': '4.0026', 'num_lipinski_ro5_violations': None, 'num_ro5_violations': None, 'psa': None, 'qed_weighted': None, 'ro3_pass': None, 'rtb': None}, 'molecule_structures': {'canonical_smiles': '[He]', 'molfile': '\\n     RDKit          2D\\n\\n  1  0  0  0  0  0  0  0  0  0999 V2000\\n   -3.1917    1.0958    0.0000 He  0  0  0  0  0 15  0  0  0  0  0  0\\nM  END\\n\\n> <chembl_id>\\nCHEMBL1796997\\n\\n> <chembl_pref_name>\\nHELIUM\\n\\n', 'standard_inchi': 'InChI=1S/He', 'standard_inchi_key': 'SWQJXJOGLNCZEY-UHFFFAOYSA-N'}, 'molecule_synonyms': [{'molecule_synonym': 'E939', 'syn_type': 'E_NUMBER', 'synonyms': 'E939'}, {'molecule_synonym': 'E-939', 'syn_type': 'RESEARCH_CODE', 'synonyms': 'E-939'}, {'molecule_synonym': 'Helium', 'syn_type': 'ATC', 'synonyms': 'HELIUM'}, {'molecule_synonym': 'Helium', 'syn_type': 'FDA', 'synonyms': 'HELIUM'}, {'molecule_synonym': 'Helium', 'syn_type': 'MERCK_INDEX', 'synonyms': 'HELIUM'}, {'molecule_synonym': 'Helium', 'syn_type': 'OTHER', 'synonyms': 'HELIUM'}, {'molecule_synonym': 'Helium', 'syn_type': 'USP', 'synonyms': 'HELIUM'}, {'molecule_synonym': 'INS-939', 'syn_type': 'RESEARCH_CODE', 'synonyms': 'INS-939'}, 
{'molecule_synonym': 'INS NO.939', 'syn_type': 'RESEARCH_CODE', 'synonyms': 'INS NO.939'}], 'molecule_type': 'Small molecule', 'natural_product': 0, 'oral': False, 'parenteral': False, 'polymer_flag': False, 'pref_name': 'HELIUM', 'prodrug': 0, 'structure_type': 'MOL', 'therapeutic_flag': False, 'topical': True, 'usan_stem': '-ium', 'usan_stem_definition': 'quaternary ammonium derivatives', 'usan_substem': '-ium', 'usan_year': None, 'withdrawn_class': None, 'withdrawn_country': None, 'withdrawn_flag': False, 'withdrawn_reason': None, 'withdrawn_year': None}, {'atc_classifications': [], 'availability_type': 1, 'biotherapeutic': None, 'black_box_warning': 0, 'chebi_par_id': 16134, 'chirality': 2, 'cross_references': [{'xref_id': 'ammonia%20n-13', 'xref_name': 'ammonia n-13', 'xref_src': 'DailyMed'}, {'xref_id': 'Ammonia', 'xref_name': None, 'xref_src': 'Wikipedia'}], 'dosed_ingredient': False, 'first_approval': 2007, 'first_in_class': 0, 'helm_notation': None, 'indication_class': 'Pharmaceutic Aid (solvent and source of ammonia),Radioactive Agent; Diagnostic Aid (cardiac imaging); Diagnostic Aid (liver imaging)', 'inorganic_flag': 0, 'max_phase': 4, 'molecule_chembl_id': 'CHEMBL1160819', 'molecule_hierarchy': {'molecule_chembl_id': 'CHEMBL1160819', 'parent_chembl_id': 'CHEMBL1160819'}, 'molecule_properties': {'alogp': '0.16', 'aromatic_rings': 0, 'cx_logd': '-2.44', 'cx_logp': '-0.98', 'cx_most_apka': None, 'cx_most_bpka': '8.86', 'full_molformula': 'H3N', 'full_mwt': '17.03', 'hba': 1, 'hba_lipinski': 1, 'hbd': 1, 'hbd_lipinski': 3, 'heavy_atoms': 1, 'molecular_species': 'BASE', 'mw_freebase': '17.03', 'mw_monoisotopic': '17.0265', 'num_lipinski_ro5_violations': 0, 'num_ro5_violations': 0, 'psa': '35.00', 'qed_weighted': '0.40', 'ro3_pass': 'Y', 'rtb': 0}, 'molecule_structures': {'canonical_smiles': 'N', 'molfile': '\\n     RDKit          2D\\n\\n  1  0  0  0  0  0  0  0  0  0999 V2000\\n    1.3000    0.7500    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\nM  
END\\n\\n> <chembl_id>\\nCHEMBL1160819\\n\\n> <chembl_pref_name>\\nAMMONIA SOLUTION, STRONG\\n\\n', 'standard_inchi': 'InChI=1S/H3N/h1H3', 'standard_inchi_key': 'QGZKDVFQNNGYKY-UHFFFAOYSA-N'}, 'molecule_synonyms': [{'molecule_synonym': 'Ammonia', 'syn_type': 'INN', 'synonyms': 'Ammonia'}, {'molecule_synonym': 'Ammonia', 'syn_type': 'MERCK_INDEX', 'synonyms': 'AMMONIA'}, {'molecule_synonym': 'Ammonia', 'syn_type': 'OTHER', 'synonyms': 'AMMONIA'}, {'molecule_synonym': 'Ammonia solution, strong', 'syn_type': 'NATIONAL_FORMULARY', 'synonyms': 'AMMONIA SOLUTION, STRONG'}, {'molecule_synonym': 'Ammonia solution, strong', 'syn_type': 'OTHER', 'synonyms': 'AMMONIA SOLUTION, STRONG'}, {'molecule_synonym': 'Ammonia spirit, aromatic', 'syn_type': 'OTHER', 'synonyms': 'AMMONIA SPIRIT, AROMATIC'}, {'molecule_synonym': 'Ammonia water', 'syn_type': 'JAN', 'synonyms': 'AMMONIA WATER'}, {'molecule_synonym': 'Ammonia water', 'syn_type': 'MERCK_INDEX', 'synonyms': 'AMMONIA WATER'}, {'molecule_synonym': 'Ammonia water', 'syn_type': 'OTHER', 'synonyms': 'AMMONIA WATER'}, {'molecule_synonym': 'E-527', 'syn_type': 'RESEARCH_CODE', 'synonyms': 'E-527'}, {'molecule_synonym': 'INS-527', 'syn_type': 'RESEARCH_CODE', 'synonyms': 'INS-527'}, {'molecule_synonym': 'INS NO.527', 'syn_type': 'RESEARCH_CODE', 'synonyms': 'INS NO.527'}], 'molecule_type': 'Small molecule', 'natural_product': 0, 'oral': False, 'parenteral': True, 'polymer_flag': False, 'pref_name': 'AMMONIA SOLUTION, STRONG', 'prodrug': 0, 'structure_type': 'MOL', 'therapeutic_flag': False, 'topical': False, 'usan_stem': None, 'usan_stem_definition': None, 'usan_substem': None, 'usan_year': 1990, 'withdrawn_class': None, 'withdrawn_country': None, 'withdrawn_flag': False, 'withdrawn_reason': None, 'withdrawn_year': None}, {'atc_classifications': [], 'availability_type': 1, 'biotherapeutic': None, 'black_box_warning': 0, 'chebi_par_id': None, 'chirality': 2, 'cross_references': [{'xref_id': 'ammonia%20n-13', 'xref_name': 'ammonia 
n-13', 'xref_src': 'DailyMed'}], 'dosed_ingredient': True, 'first_approval': 2007, 'first_in_class': 0, 'helm_notation': None, 'indication_class': 'Radioactive Agent; Diagnostic Aid (cardiac imaging); Diagnostic Aid (liver imaging)', 'inorganic_flag': 0, 'max_phase': 4, 'molecule_chembl_id': 'CHEMBL1201189', 'molecule_hierarchy': {'molecule_chembl_id': 'CHEMBL1201189', 'parent_chembl_id': 'CHEMBL1160819'}, 'molecule_properties': {'alogp': '0.16', 'aromatic_rings': 0, 'cx_logd': '-2.44', 'cx_logp': '-0.98', 'cx_most_apka': None, 'cx_most_bpka': '8.86', 'full_molformula': 'H3N', 'full_mwt': '16.03', 'hba': 1, 'hba_lipinski': 1, 'hbd': 1, 'hbd_lipinski': 3, 'heavy_atoms': 1, 'molecular_species': 'BASE', 'mw_freebase': '17.03', 'mw_monoisotopic': '17.0265', 'num_lipinski_ro5_violations': 0, 'num_ro5_violations': 0, 'psa': '35.00', 'qed_weighted': '0.40', 'ro3_pass': 'Y', 'rtb': 0}, 'molecule_structures': {'canonical_smiles': '[13NH3]', 'molfile': '\\n     RDKit          2D\\n\\n  1  0  0  0  0  0  0  0  0  0999 V2000\\n   -0.1333   -0.4083    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\nM  ISO  1   1  13\\nM  END\\n\\n> <chembl_id>\\nCHEMBL1201189\\n\\n> <chembl_pref_name>\\nAMMONIA N 13\\n\\n', 'standard_inchi': 'InChI=1S/H3N/h1H3/i1-1', 'standard_inchi_key': 'QGZKDVFQNNGYKY-BJUDXGSMSA-N'}, 'molecule_synonyms': [{'molecule_synonym': '13n', 'syn_type': 'OTHER', 'synonyms': '13N'}, {'molecule_synonym': 'Ammonia (13N)', 'syn_type': 'OTHER', 'synonyms': 'Ammonia, n-13'}, {'molecule_synonym': 'Ammonia n 13', 'syn_type': 'OTHER', 'synonyms': 'AMMONIA N 13'}, {'molecule_synonym': 'Ammonia n 13', 'syn_type': 'TRADE_NAME', 'synonyms': 'AMMONIA N 13'}, {'molecule_synonym': 'Ammonia n 13', 'syn_type': 'USAN', 'synonyms': 'AMMONIA N 13'}, {'molecule_synonym': 'Ammonia n 13', 'syn_type': 'USP', 'synonyms': 'AMMONIA N 13'}, {'molecule_synonym': 'Ammonia n-13', 'syn_type': 'FDA', 'synonyms': 'AMMONIA N-13'}], 'molecule_type': 'Small molecule', 'natural_product': 0, 'oral': False, 
'parenteral': True, 'polymer_flag': False, 'pref_name': 'AMMONIA N 13', 'prodrug': 0, 'structure_type': 'MOL', 'therapeutic_flag': False, 'topical': False, 'usan_stem': None, 'usan_stem_definition': None, 'usan_substem': None, 'usan_year': 1990, 'withdrawn_class': None, 'withdrawn_country': None, 'withdrawn_flag': False, 'withdrawn_reason': None, 'withdrawn_year': None}, {'atc_classifications': [], 'availability_type': 2, 'biotherapeutic': None, 'black_box_warning': 0, 'chebi_par_id': 15377, 'chirality': 2, 'cross_references': [{'xref_id': 'purified%20water', 'xref_name': 'purified water', 'xref_src': 'DailyMed'}, {'xref_id': 'sterile%20water%20for%20irrigation', 'xref_name': 'sterile water for irrigation', 'xref_src': 'DailyMed'}, {'xref_id': 'Properties_of_water', 'xref_name': None, 'xref_src': 'Wikipedia'}], 'dosed_ingredient': True, 'first_approval': 2011, 'first_in_class': 0, 'helm_notation': None, 'indication_class': 'Diagnostic Aid (radioactive, vascular disorders); Radioactive Agent,Radioactive Agent', 'inorganic_flag': 0, 'max_phase': 4, 'molecule_chembl_id': 'CHEMBL1098659', 'molecule_hierarchy': {'molecule_chembl_id': 'CHEMBL1098659', 'parent_chembl_id': 'CHEMBL1098659'}, 'molecule_properties': {'alogp': '-0.82', 'aromatic_rings': 0, 'cx_logd': '-0.65', 'cx_logp': '-0.65', 'cx_most_apka': None, 'cx_most_bpka': None, 'full_molformula': 'H2O', 'full_mwt': '18.02', 'hba': 0, 'hba_lipinski': 1, 'hbd': 0, 'hbd_lipinski': 2, 'heavy_atoms': 1, 'molecular_species': 'NEUTRAL', 'mw_freebase': '18.02', 'mw_monoisotopic': '18.0106', 'num_lipinski_ro5_violations': 0, 'num_ro5_violations': 0, 'psa': '31.50', 'qed_weighted': '0.33', 'ro3_pass': 'Y', 'rtb': 0}, 'molecule_structures': {'canonical_smiles': 'O', 'molfile': '\\n     RDKit          2D\\n\\n  1  0  0  0  0  0  0  0  0  0999 V2000\\n    0.4917   -8.4375    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\nM  END\\n\\n> <chembl_id>\\nCHEMBL1098659\\n\\n> <chembl_pref_name>\\nWATER\\n\\n', 'standard_inchi': 
'InChI=1S/H2O/h1H2', 'standard_inchi_key': 'XLYOFNOQVPJJNP-UHFFFAOYSA-N'}, 'molecule_synonyms': [{'molecule_synonym': 'B1217', 'syn_type': 'RESEARCH_CODE', 'synonyms': 'B1217'}, {'molecule_synonym': 'NSC-147337', 'syn_type': 'RESEARCH_CODE', 'synonyms': 'NSC-147337'}, {'molecule_synonym': 'Purified water', 'syn_type': 'FDA', 'synonyms': 'PURIFIED WATER'}, {'molecule_synonym': 'Purified water', 'syn_type': 'USP', 'synonyms': 'PURIFIED WATER'}, {'molecule_synonym': 'Pur-wash', 'syn_type': 'TRADE_NAME', 'synonyms': 'PUR-WASH'}, {'molecule_synonym': 'R-718', 'syn_type': 'RESEARCH_CODE', 'synonyms': 'R-718'}, {'molecule_synonym': 'Sterile purified water', 'syn_type': 'USP', 'synonyms': 'STERILE PURIFIED WATER'}, {'molecule_synonym': 'Sterile water', 'syn_type': 'TRADE_NAME', 'synonyms': 'STERILE WATER'}, {'molecule_synonym': 'Sterile water for injection', 'syn_type': 'FDA', 'synonyms': 'STERILE WATER FOR INJECTION'}, {'molecule_synonym': 'Sterile water for injection', 'syn_type': 'TRADE_NAME', 'synonyms': 'STERILE WATER FOR INJECTION'}, {'molecule_synonym': 'Sterile water for irrigation', 'syn_type': 'FDA', 'synonyms': 'STERILE WATER FOR IRRIGATION'}, {'molecule_synonym': 'Water', 'syn_type': 'MERCK_INDEX', 'synonyms': 'WATER'}, {'molecule_synonym': 'Water', 'syn_type': 'OTHER', 'synonyms': 'WATER'}, {'molecule_synonym': 'Water', 'syn_type': 'USP', 'synonyms': 'WATER'}, {'molecule_synonym': 'Water for injection', 'syn_type': 'BNF', 'synonyms': 'WATER FOR INJECTION'}, {'molecule_synonym': 'Water for injection', 'syn_type': 'OTHER', 'synonyms': 'WATER FOR INJECTION'}, {'molecule_synonym': 'Water purified', 'syn_type': 'OTHER', 'synonyms': 'WATER PURIFIED'}, {'molecule_synonym': 'Water, purified', 'syn_type': 'OTHER', 'synonyms': 'WATER, PURIFIED'}], 'molecule_type': 'Small molecule', 'natural_product': 0, 'oral': False, 'parenteral': False, 'polymer_flag': False, 'pref_name': 'WATER', 'prodrug': 0, 'structure_type': 'MOL', 'therapeutic_flag': False, 'topical': True, 
'usan_stem': 'deu-', 'usan_stem_definition': 'deuterated compounds', 'usan_substem': 'deu-', 'usan_year': 1963, 'withdrawn_class': None, 'withdrawn_country': None, 'withdrawn_flag': False, 'withdrawn_reason': None, 'withdrawn_year': None}, '...(remaining elements truncated)...']"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "molecule = new_client.molecule\n",
    "approved_drugs = molecule.filter(max_phase=4).order_by('molecule_properties__mw_freebase')\n",
    "approved_drugs"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2cd47d0c",
   "metadata": {},
   "source": [
    "## Get drugs and clinical candidates with a lung carcinoma indication"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "99849173",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "631"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "drug_indication = new_client.drug_indication\n",
    "molecules = new_client.molecule\n",
    "\n",
    "# Collect indications whose EFO term mentions lung carcinoma\n",
    "lung_cancer_ind = drug_indication.filter(efo_term__icontains=\"LUNG CARCINOMA\")\n",
    "# Fetch the molecule records referenced by those indications\n",
    "lung_cancer_mols = molecules.filter(\n",
    "    molecule_chembl_id__in=[x['molecule_chembl_id'] for x in lung_cancer_ind])\n",
    "\n",
    "len(lung_cancer_mols)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25bdd485",
   "metadata": {},
   "source": [
    "## Filter drugs by approval year and USAN stem"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "6a801fde",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'applicants': ['Hikma Pharmaceuticals International Ltd', 'Jubilant Cadista Pharmaceuticals Inc', 'Apnar Pharma Lp', 'Abbott Laboratories Pharmaceutical Products Div', 'Ranbaxy Laboratories Ltd', 'Beximco Pharmaceuticals Usa Inc', 'Ivax Pharmaceuticals Inc Sub Teva Pharmaceuticals Usa', 'Mylan Technologies Inc', 'Teva Pharmaceuticals Usa Inc', 'Sandoz Inc'], 'atc_classification': [{'code': 'G04CA03', 'description': 'GENITO URINARY SYSTEM AND SEX HORMONES: UROLOGICALS: DRUGS USED IN BENIGN PROSTATIC HYPERTROPHY: Alpha-adrenoreceptor antagonists'}], 'availability_type': 1, 'biotherapeutic': None, 'black_box': False, 'black_box_warning': '0', 'chirality': 0, 'development_phase': 4, 'drug_type': 1, 'first_approval': 1987, 'first_in_class': False, 'helm_notation': None, 'indication_class': 'Antihypertensive', 'molecule_chembl_id': 'CHEMBL611', 'molecule_properties': {'alogp': '1.06', 'aromatic_rings': 2, 'cx_logd': '0.95', 'cx_logp': '1.18', 'cx_most_apka': None, 'cx_most_bpka': '7.24', 'full_molformula': 'C19H25N5O4', 'full_mwt': '387.44', 'hba': 8, 'hba_lipinski': 9, 'hbd': 1, 'hbd_lipinski': 2, 'heavy_atoms': 28, 'molecular_species': 'NEUTRAL', 'mw_freebase': '387.44', 'mw_monoisotopic': '387.1907', 'num_lipinski_ro5_violations': 0, 'num_ro5_violations': 0, 'psa': '103.04', 'qed_weighted': '0.83', 'ro3_pass': 'N', 'rtb': 4}, 'molecule_structures': {'canonical_smiles': 'COc1cc2nc(N3CCN(C(=O)C4CCCO4)CC3)nc(N)c2cc1OC', 'molfile': '\\n     RDKit          2D\\n\\n 28 31  0  0  0  0  0  0  0  0999 V2000\\n   -0.4208    0.3583    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -0.4208    1.0625    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n   -1.6458    1.0625    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -1.0333    1.4250    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -1.0333    0.0083    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n   -1.6458    0.3583    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    0.1917   -0.0125    0.0000 N   
0  0  0  0  0  0  0  0  0  0  0  0\\n    2.0167   -1.1167    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    1.4250   -0.7292    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n   -2.2708    1.4250    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -2.2708    0.0083    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -2.8833    1.0625    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -2.8833    0.3583    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    2.6750   -0.7667    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    0.1917   -0.7125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    0.8042    0.3458    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    0.8042   -1.0917    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    1.4250   -0.0125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    2.0167   -1.8042    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n    3.2625   -1.1917    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n   -1.0333    2.1333    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n   -3.4958    1.4125    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n   -3.4958   -0.0042    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n    3.8292   -0.7667    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    2.9167   -0.0917    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -4.1083    1.0625    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -4.1083    0.3458    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    3.5792   -0.0917    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n  2  1  1  0\\n  3  6  2  0\\n  4  2  2  0\\n  5  1  2  0\\n  6  5  1  0\\n  7  1  1  0\\n  8  9  1  0\\n  9 18  1  0\\n 10  3  1  0\\n 11  6  1  0\\n 12 13  1  0\\n 13 11  2  0\\n 14  8  1  0\\n 15  7  1  0\\n 16  7  1  0\\n 17 15  1  0\\n 18 16  1  0\\n 19  8  2  0\\n 20 14  1  0\\n 21  4  1  0\\n 22 12  1  0\\n 23 13  1  0\\n 24 20  1  0\\n 25 14  1  0\\n 26 22  1  0\\n 27 23  1  0\\n 28 25  1  0\\n  4  3  1  0\\n 17  9  1  0\\n 12 10  2  0\\n 
28 24  1  0\\nM  END', 'standard_inchi': 'InChI=1S/C19H25N5O4/c1-26-15-10-12-13(11-16(15)27-2)21-19(22-17(12)20)24-7-5-23(6-8-24)18(25)14-4-3-9-28-14/h10-11,14H,3-9H2,1-2H3,(H2,20,21,22)', 'standard_inchi_key': 'VCKUSRYTPJJLNI-UHFFFAOYSA-N'}, 'molecule_synonyms': [{'molecule_synonym': 'Hytrin', 'syn_type': 'OTHER', 'synonyms': 'Hytrin'}, {'molecule_synonym': 'Terazosin', 'syn_type': 'FDA', 'synonyms': 'Terazosin'}, {'molecule_synonym': 'Terazosin', 'syn_type': 'ATC', 'synonyms': 'TERAZOSIN'}, {'molecule_synonym': 'Terazosin', 'syn_type': 'BAN', 'synonyms': 'TERAZOSIN'}, {'molecule_synonym': 'Terazosin', 'syn_type': 'INN', 'synonyms': 'TERAZOSIN'}, {'molecule_synonym': 'Terazosin', 'syn_type': 'MERCK_INDEX', 'synonyms': 'TERAZOSIN'}, {'molecule_synonym': 'Terazosin', 'syn_type': 'OTHER', 'synonyms': 'TERAZOSIN'}], 'ob_patent': None, 'oral': True, 'parenteral': False, 'prodrug': False, 'research_codes': ['ABBOTT-45975 ANHYDROUS', 'ABBOTT-45975'], 'rule_of_five': True, 'sc_patent': None, 'synonyms': ['Terazosin (BAN, INN, MI)', 'Terazosin hydrochloride (FDA, JAN, MI, USAN, USP)', 'Terazosin hydrochloride dihydrate (MI)', 'Terazosin hydrochloride hydrate (JAN)', ''], 'topical': False, 'usan_stem': '-azosin', 'usan_stem_definition': 'antihypertensives (prazosin type)', 'usan_stem_substem': '-azosin(-azosin)', 'usan_year': 1980, 'withdrawn_class': None, 'withdrawn_country': None, 'withdrawn_flag': '0', 'withdrawn_reason': None, 'withdrawn_year': None}, {'applicants': ['Accord Healthcare Inc', 'Zydus Pharmaceuticals Usa Inc', 'Yaopharma Co Ltd', 'Dava Pharmaceuticals Inc', 'Ivax Pharmaceuticals Inc Sub Teva Pharmaceuticals Usa', 'Apotex Inc', 'Nesher Pharmaceuticals Usa Llc', 'Upjohn Us 1 Llc', 'Ani Pharmaceuticals Inc', 'Teva Pharmaceuticals Usa Inc'], 'atc_classification': [{'code': 'C02CA04', 'description': 'CARDIOVASCULAR SYSTEM: ANTIHYPERTENSIVES: ANTIADRENERGIC AGENTS, PERIPHERALLY ACTING: Alpha-adrenoreceptor antagonists'}], 'availability_type': 1, 
'biotherapeutic': None, 'black_box': False, 'black_box_warning': '0', 'chirality': 0, 'development_phase': 4, 'drug_type': 1, 'first_approval': 1990, 'first_in_class': False, 'helm_notation': None, 'indication_class': 'Antihypertensive', 'molecule_chembl_id': 'CHEMBL707', 'molecule_properties': {'alogp': '1.72', 'aromatic_rings': 3, 'cx_logd': '1.91', 'cx_logp': '2.14', 'cx_most_apka': '12.67', 'cx_most_bpka': '7.24', 'full_molformula': 'C23H25N5O5', 'full_mwt': '451.48', 'hba': 9, 'hba_lipinski': 10, 'hbd': 1, 'hbd_lipinski': 2, 'heavy_atoms': 33, 'molecular_species': 'NEUTRAL', 'mw_freebase': '451.48', 'mw_monoisotopic': '451.1856', 'num_lipinski_ro5_violations': 0, 'num_ro5_violations': 0, 'psa': '112.27', 'qed_weighted': '0.63', 'ro3_pass': 'N', 'rtb': 4}, 'molecule_structures': {'canonical_smiles': 'COc1cc2nc(N3CCN(C(=O)C4COc5ccccc5O4)CC3)nc(N)c2cc1OC', 'molfile': '\\n     RDKit          2D\\n\\n 33 37  0  0  0  0  0  0  0  0999 V2000\\n    0.0167  -14.8292    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    0.0167  -15.5500    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n   -1.2208  -15.5500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -0.6083  -15.9125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -0.6083  -14.4667    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n    3.0792  -13.7542    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -1.2208  -14.8292    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    2.4792  -13.4042    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    3.7042  -13.4167    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n    0.6417  -14.4667    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n    1.8667  -13.7542    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n   -1.8333  -15.9125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -1.8333  -14.4667    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    3.6792  -14.8167    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n    4.3042  -13.7667    0.0000 C   
0  0  0  0  0  0  0  0  0  0  0  0\\n   -2.4583  -15.5500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -2.4583  -14.8292    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    3.0667  -14.4667    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    4.2917  -14.4667    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    0.6417  -13.7667    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    1.2542  -14.8167    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    1.2417  -13.4167    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    1.8667  -14.4667    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    2.4792  -12.7042    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n   -0.6083  -16.6125    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n   -3.0833  -15.9125    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n   -3.0833  -14.4667    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n    4.9417  -13.4250    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    4.9167  -14.8292    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -3.6875  -14.8125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -3.6833  -15.5500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    5.5542  -13.7875    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    5.5417  -14.4875    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n  2  1  1  0\\n  3  7  2  0\\n  4  2  2  0\\n  5  1  2  0\\n  6  8  1  0\\n  7  5  1  0\\n  8 11  1  0\\n  9  6  1  0\\n 10  1  1  0\\n 11 22  1  0\\n 12  3  1  0\\n 13  7  1  0\\n 14 18  1  0\\n 15  9  1  0\\n 16 12  2  0\\n 17 13  2  0\\n 18  6  1  0\\n 19 14  1  0\\n 20 10  1  0\\n 21 10  1  0\\n 22 20  1  0\\n 23 21  1  0\\n 24  8  2  0\\n 25  4  1  0\\n 26 16  1  0\\n 27 17  1  0\\n 28 15  1  0\\n 29 19  1  0\\n 30 27  1  0\\n 31 26  1  0\\n 32 28  2  0\\n 33 29  2  0\\n  4  3  1  0\\n 23 11  1  0\\n 17 16  1  0\\n 15 19  2  0\\n 32 33  1  0\\nM  END', 'standard_inchi': 
'InChI=1S/C23H25N5O5/c1-30-18-11-14-15(12-19(18)31-2)25-23(26-21(14)24)28-9-7-27(8-10-28)22(29)20-13-32-16-5-3-4-6-17(16)33-20/h3-6,11-12,20H,7-10,13H2,1-2H3,(H2,24,25,26)', 'standard_inchi_key': 'RUZYUOTYCVRMRZ-UHFFFAOYSA-N'}, 'molecule_synonyms': [{'molecule_synonym': 'C02CA04', 'syn_type': 'RESEARCH_CODE', 'synonyms': 'C02CA04'}, {'molecule_synonym': 'Cardura', 'syn_type': 'OTHER', 'synonyms': 'Cardura'}, {'molecule_synonym': 'Doxazosin', 'syn_type': 'FDA', 'synonyms': 'Doxazosin'}, {'molecule_synonym': 'Doxazosin', 'syn_type': 'ATC', 'synonyms': 'DOXAZOSIN'}, {'molecule_synonym': 'Doxazosin', 'syn_type': 'BAN', 'synonyms': 'DOXAZOSIN'}, {'molecule_synonym': 'Doxazosin', 'syn_type': 'INN', 'synonyms': 'DOXAZOSIN'}, {'molecule_synonym': 'Doxazosin', 'syn_type': 'MERCK_INDEX', 'synonyms': 'DOXAZOSIN'}, {'molecule_synonym': 'Doxazosin', 'syn_type': 'OTHER', 'synonyms': 'DOXAZOSIN'}, {'molecule_synonym': 'UK-33274', 'syn_type': 'RESEARCH_CODE', 'synonyms': 'UK-33274'}], 'ob_patent': None, 'oral': True, 'parenteral': False, 'prodrug': False, 'research_codes': ['UK-33274', 'UK-33,274-27', 'UK-33274-27', 'C02CA04'], 'rule_of_five': True, 'sc_patent': None, 'synonyms': ['Doxazosin mesilate (JAN)', 'Doxazosin mesylate (FDA, USAN, USP)', 'Doxazosin methanesulfonate (MI)', 'Doxazosin (BAN, INN, MI)', ''], 'topical': False, 'usan_stem': '-azosin', 'usan_stem_definition': 'antihypertensives (prazosin type)', 'usan_stem_substem': '-azosin(-azosin)', 'usan_year': 1981, 'withdrawn_class': None, 'withdrawn_country': None, 'withdrawn_flag': '0', 'withdrawn_reason': None, 'withdrawn_year': None}]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "drug = new_client.drug\n",
    "res = drug.filter(first_approval__gte=1980).filter(usan_stem=\"-azosin\")\n",
    "res"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "13dbd935",
   "metadata": {},
   "source": [
    "## Get all biotherapeutic molecules"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "131f3398",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "22963"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "molecule = new_client.molecule\n",
    "biotherapeutics = molecule.filter(biotherapeutic__isnull=False)\n",
    "len(biotherapeutics)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "92603b03",
   "metadata": {},
   "source": [
    "## Get molecules with molecular weight <= 300"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "e1152b6b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "367682"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "molecule = new_client.molecule\n",
    "light_molecules = molecule.filter(molecule_properties__mw_freebase__lte=300)\n",
    "\n",
    "len(light_molecules)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "10cf799b",
   "metadata": {},
   "source": [
    "## Get molecules with molecular weight <= 300 AND `pref_name` ending with `nib`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "e0692b5f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'molecule_chembl_id': 'CHEMBL276711', 'pref_name': 'SEMAXANIB'}, {'molecule_chembl_id': 'CHEMBL4594348', 'pref_name': 'ELSUBRUTINIB'}]"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "molecule = new_client.molecule\n",
    "light_nib_molecules = molecule.filter(molecule_properties__mw_freebase__lte=300, pref_name__iendswith=\"nib\").only(['molecule_chembl_id', 'pref_name'])\n",
    "\n",
    "light_nib_molecules"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0a644bb1",
   "metadata": {},
   "source": [
    "## Get all molecules in ChEMBL with no Rule-of-Five violations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "0a7228e1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1441706"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "molecule = new_client.molecule\n",
    "no_violations = molecule.filter(molecule_properties__num_ro5_violations=0)\n",
    "len(no_violations)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01a61542",
   "metadata": {},
   "source": [
    "# Activities"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c2712a48",
   "metadata": {},
   "source": [
    "## Get all IC50 activities related to the hERG target"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "f9b48d8e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "13200"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "target = new_client.target\n",
    "activity = new_client.activity\n",
    "# Look up the hERG target record, keeping only its ChEMBL ID\n",
    "herg = target.filter(pref_name__iexact='hERG').only('target_chembl_id')[0]\n",
    "# Restrict activities to that target and to IC50 measurements\n",
    "herg_activities = activity.filter(target_chembl_id=herg['target_chembl_id']).filter(standard_type=\"IC50\")\n",
    "\n",
    "len(herg_activities)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3d89df4e",
   "metadata": {},
   "source": [
    "## Get all activities for a specific target with assay type B (binding)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "aabe97fc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "860"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "activity = new_client.activity\n",
    "res = activity.filter(target_chembl_id='CHEMBL3938', assay_type='B')\n",
    "\n",
    "len(res)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43165897",
   "metadata": {},
   "source": [
    "## Get all activities with a pChEMBL value for a molecule"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "6f0787f7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "138"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "activities = new_client.activity\n",
    "res = activities.filter(molecule_chembl_id=\"CHEMBL25\", pchembl_value__isnull=False)\n",
    "\n",
    "len(res)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa79714b",
   "metadata": {},
   "source": [
    "## Search for ADME-related inhibition assays (type A)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "5568d498",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'assay_category': None, 'assay_cell_type': None, 'assay_chembl_id': 'CHEMBL884521', 'assay_classifications': [], 'assay_organism': 'Rattus norvegicus', 'assay_parameters': [], 'assay_strain': None, 'assay_subcellular_fraction': None, 'assay_tax_id': 10116, 'assay_test_type': None, 'assay_tissue': None, 'assay_type': 'A', 'assay_type_description': 'ADME', 'bao_format': 'BAO_0000357', 'bao_label': 'single protein format', 'cell_chembl_id': None, 'confidence_description': 'Direct single protein target assigned', 'confidence_score': 9, 'description': 'Inhibition of cytochrome P450 progesterone 15-alpha hydroxylase', 'document_chembl_id': 'CHEMBL1125500', 'relationship_description': 'Direct protein target assigned', 'relationship_type': 'D', 'src_assay_id': None, 'src_id': 1, 'target_chembl_id': 'CHEMBL3705', 'tissue_chembl_id': None, 'variant_sequence': None}, {'assay_category': None, 'assay_cell_type': None, 'assay_chembl_id': 'CHEMBL615148', 'assay_classifications': [], 'assay_organism': 'Rattus norvegicus', 'assay_parameters': [], 'assay_strain': None, 'assay_subcellular_fraction': None, 'assay_tax_id': 10116, 'assay_test_type': None, 'assay_tissue': None, 'assay_type': 'A', 'assay_type_description': 'ADME', 'bao_format': 'BAO_0000019', 'bao_label': 'assay format', 'cell_chembl_id': None, 'confidence_description': 'Default value - Target unknown or has yet to be assigned', 'confidence_score': 0, 'description': 'Inhibition of cytochrome P450 progesterone 16-alpha hydroxylase', 'document_chembl_id': 'CHEMBL1125500', 'relationship_description': 'Default value - Target has yet to be curated', 'relationship_type': 'U', 'src_assay_id': None, 'src_id': 1, 'target_chembl_id': 'CHEMBL612558', 'tissue_chembl_id': None, 'variant_sequence': None}, {'assay_category': None, 'assay_cell_type': None, 'assay_chembl_id': 'CHEMBL615199', 'assay_classifications': [], 'assay_organism': 'Rattus norvegicus', 'assay_parameters': [], 'assay_strain': None, 
'assay_subcellular_fraction': None, 'assay_tax_id': 10116, 'assay_test_type': None, 'assay_tissue': None, 'assay_type': 'A', 'assay_type_description': 'ADME', 'bao_format': 'BAO_0000357', 'bao_label': 'single protein format', 'cell_chembl_id': None, 'confidence_description': 'Direct single protein target assigned', 'confidence_score': 9, 'description': 'Inhibition of cytochrome P450 progesterone 2-alpha-hydroxylase', 'document_chembl_id': 'CHEMBL1125500', 'relationship_description': 'Direct protein target assigned', 'relationship_type': 'D', 'src_assay_id': None, 'src_id': 1, 'target_chembl_id': 'CHEMBL4971', 'tissue_chembl_id': None, 'variant_sequence': None}, {'assay_category': None, 'assay_cell_type': None, 'assay_chembl_id': 'CHEMBL883800', 'assay_classifications': [], 'assay_organism': 'Rattus norvegicus', 'assay_parameters': [], 'assay_strain': None, 'assay_subcellular_fraction': 'Microsome', 'assay_tax_id': 10116, 'assay_test_type': None, 'assay_tissue': None, 'assay_type': 'A', 'assay_type_description': 'ADME', 'bao_format': 'BAO_0000251', 'bao_label': 'microsome format', 'cell_chembl_id': None, 'confidence_description': 'Homologous single protein target assigned', 'confidence_score': 8, 'description': 'Inhibition of progesterone 6-beta-hydroxylase in rat hepatic microsomes', 'document_chembl_id': 'CHEMBL1125500', 'relationship_description': 'Homologous protein target assigned', 'relationship_type': 'H', 'src_assay_id': None, 'src_id': 1, 'target_chembl_id': 'CHEMBL340', 'tissue_chembl_id': None, 'variant_sequence': None}, '...(remaining elements truncated)...']"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "assay = new_client.assay\n",
    "res = assay.filter(description__icontains='inhibit', assay_type='A')\n",
    "res"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "85cb9566",
   "metadata": {},
   "source": [
    "# Tissues"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "662b93c4",
   "metadata": {},
   "source": [
    "## Get tissue by BTO ID"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "dd5d9b29",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'bto_id': 'BTO:0001073', 'caloha_id': 'TS-0798', 'efo_id': 'EFO:0000857', 'pref_name': 'Pituitary gland', 'tissue_chembl_id': 'CHEMBL3638173', 'uberon_id': 'UBERON:0000007'}]"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "tissue = new_client.tissue\n",
    "res = tissue.filter(bto_id=\"BTO:0001073\")\n",
    "res"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "57fe48cf",
   "metadata": {},
   "source": [
    "## Get tissue by CALOHA ID"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "464e2169",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'bto_id': 'BTO:0000648', 'caloha_id': 'TS-0490', 'efo_id': 'EFO:0000834', 'pref_name': 'Intestine', 'tissue_chembl_id': 'CHEMBL3638176', 'uberon_id': 'UBERON:0000160'}]"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "tissue = new_client.tissue\n",
    "res = tissue.filter(caloha_id=\"TS-0490\")\n",
    "res"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "44e8f5e7",
   "metadata": {},
   "source": [
    "## Get tissue by UBERON ID"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "541b664a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'bto_id': 'BTO:0000068', 'caloha_id': 'TS-0034', 'efo_id': None, 'pref_name': 'Amniotic fluid', 'tissue_chembl_id': 'CHEMBL3638177', 'uberon_id': 'UBERON:0000173'}]"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "tissue = new_client.tissue\n",
    "res = tissue.filter(uberon_id=\"UBERON:0000173\")\n",
    "res"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2e45330b",
   "metadata": {},
   "source": [
    "## Get tissue by name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "b91d01fb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'bto_id': None, 'caloha_id': None, 'efo_id': None, 'pref_name': 'Blood brain barrier', 'tissue_chembl_id': 'CHEMBL3987461', 'uberon_id': 'UBERON:0000120'}, {'bto_id': 'BTO:0000089', 'caloha_id': 'TS-0079', 'efo_id': 'EFO:0000296', 'pref_name': 'Blood', 'tissue_chembl_id': 'CHEMBL3638178', 'uberon_id': 'UBERON:0000178'}, {'bto_id': 'BTO:0001102', 'caloha_id': 'TS-0080', 'efo_id': 'EFO:0000817', 'pref_name': 'Blood vessel', 'tissue_chembl_id': 'CHEMBL3987656', 'uberon_id': 'UBERON:0001981'}, {'bto_id': 'BTO:0000102', 'caloha_id': None, 'efo_id': None, 'pref_name': 'Blood clot', 'tissue_chembl_id': 'CHEMBL3987655', 'uberon_id': 'UBERON:0010210'}, '...(remaining elements truncated)...']"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "tissue = new_client.tissue\n",
    "res = tissue.filter(pref_name__istartswith='blood')\n",
    "res"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0fc7abaa",
   "metadata": {},
   "source": [
    "# Cells"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "759f9ae0",
   "metadata": {},
   "source": [
    "## Get cell line by Cellosaurus ID"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "f0af2638",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'cell_chembl_id': 'CHEMBL3307686', 'cell_description': 'MDA-MB-435 (Breast metastasis of melanoma cells', 'cell_id': 687, 'cell_name': 'MDA-MB-435', 'cell_source_organism': 'Homo sapiens', 'cell_source_tax_id': 9606, 'cell_source_tissue': 'Breast metastasis of melanoma cells', 'cellosaurus_id': 'CVCL_0417', 'cl_lincs_id': None, 'clo_id': None, 'efo_id': 'EFO_0001213'}]"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "cell_line = new_client.cell_line\n",
    "res = cell_line.filter(cellosaurus_id=\"CVCL_0417\")\n",
    "res"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6f2e9e2a",
   "metadata": {},
   "source": [
    "# Targets"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0317d6a2",
   "metadata": {},
   "source": [
    "## Find a target by gene name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "7f160cfd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'organism': 'Homo sapiens', 'pref_name': 'Bromodomain-containing protein 4', 'target_type': 'SINGLE PROTEIN'}\n",
      "{'organism': 'Mus musculus', 'pref_name': 'Bromodomain-containing protein 4', 'target_type': 'SINGLE PROTEIN'}\n",
      "{'organism': 'Homo sapiens', 'pref_name': 'BRD4/HDAC1', 'target_type': 'PROTEIN COMPLEX'}\n",
      "{'organism': 'Homo sapiens', 'pref_name': 'Cereblon/Cullin-4A/Bromodomain-containing protein 4', 'target_type': 'PROTEIN-PROTEIN INTERACTION'}\n",
      "{'organism': 'Homo sapiens', 'pref_name': 'Cereblon/Bromodomain-containing protein 4', 'target_type': 'PROTEIN-PROTEIN INTERACTION'}\n",
      "{'organism': 'Homo sapiens', 'pref_name': 'von Hippel-Lindau disease tumor suppressor/Bromodomain-containing protein 4', 'target_type': 'PROTEIN-PROTEIN INTERACTION'}\n",
      "{'organism': 'Homo sapiens', 'pref_name': 'Cereblon/DNA damage-binding protein 1/Bromodomain-containing protein 4', 'target_type': 'PROTEIN-PROTEIN INTERACTION'}\n",
      "{'organism': 'Homo sapiens', 'pref_name': 'von Hippel-Lindau disease tumor suppressor/Elongin-B/Elongin-C/Bromodomain-containing protein 4', 'target_type': 'PROTEIN-PROTEIN INTERACTION'}\n",
      "{'organism': 'Homo sapiens', 'pref_name': 'Bromodomain and extra-terminal motif (BET)', 'target_type': 'PROTEIN FAMILY'}\n",
      "{'organism': 'Homo sapiens', 'pref_name': 'BRD4/E3 ubiquitin-protein ligase Mdm2', 'target_type': 'PROTEIN-PROTEIN INTERACTION'}\n",
      "{'organism': 'Homo sapiens', 'pref_name': 'BRD4/E3 ubiquitin-protein ligase XIAP', 'target_type': 'PROTEIN-PROTEIN INTERACTION'}\n"
     ]
    }
   ],
   "source": [
    "from chembl_webresource_client.new_client import new_client\n",
    "\n",
    "target = new_client.target\n",
    "gene_name = 'BRD4'\n",
    "res = target.filter(target_synonym__icontains=gene_name).only(['organism', 'pref_name', 'target_type'])\n",
    "for i in res:\n",
    "    print(i)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c38e41d0",
   "metadata": {},
   "source": [
    "# Utils"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "390ab150",
   "metadata": {},
   "source": [
    "## Convert SMILES to CTAB"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "c815e59b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'\\n     RDKit          2D\\n\\n 13 13  0  0  0  0  0  0  0  0999 V2000\\n   -0.9550   -1.3220    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n   -0.9528   -0.3220    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -0.0858    0.1764    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n    0.7792   -0.3254    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    0.7772   -1.3254    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    1.6422   -1.8270    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    2.5092   -1.3288    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    2.5112   -0.3288    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    1.6462    0.1728    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    1.6482    1.1728    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    0.7832    1.6746    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n    2.5152    1.6710    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n   -1.8178    0.1798    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n  1  2  2  0\\n  2  3  1  0\\n  3  4  1  0\\n  4  5  2  0\\n  5  6  1  0\\n  6  7  2  0\\n  7  8  1  0\\n  8  9  2  0\\n  9 10  1  0\\n 10 11  2  0\\n 10 12  1  0\\n  2 13  1  0\\n  9  4  1  0\\nM  END\\n$$$$\\n'"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.utils import utils\n",
    "\n",
    "aspirin = utils.smiles2ctab('O=C(Oc1ccccc1C(=O)O)C')\n",
    "aspirin"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18bb3f6c",
   "metadata": {},
   "source": [
    "## Compute maximum common substructure (MCS)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "5dcf922d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'[#6]1(-[#6]):[#6]:[#6](-[#8]-[#6]):[#6](:[#6]:[#6]:1)-[#8]'"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.utils import utils\n",
    "\n",
    "smiles = [\"O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C\",\n",
    "          \"CC(C)CCCCCC(=O)NCC1=CC(=C(C=C1)O)OC\", \"c1(C=O)cc(OC)c(O)cc1\"]\n",
    "mols = [utils.smiles2ctab(smile) for smile in smiles]\n",
    "sdf = ''.join(mols)\n",
    "result = utils.mcs(sdf)\n",
    "result"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e32cf0b5",
   "metadata": {},
   "source": [
    "## Compute various molecular descriptors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "9fdae9fc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'qed': 0.5501217966938848,\n",
       " 'MolWt': 180.15899999999996,\n",
       " 'TPSA': 63.60000000000001,\n",
       " 'HeavyAtomCount': 13,\n",
       " 'NumAromaticRings': 1,\n",
       " 'NumHAcceptors': 3,\n",
       " 'NumHDonors': 1,\n",
       " 'NumRotatableBonds': 2,\n",
       " 'MolLogP': 1.3100999999999998,\n",
       " 'MolecularFormula': 'C9H8O4',\n",
       " 'Ro3Pass': 0,\n",
       " 'NumRo5': 0,\n",
       " 'MonoisotopicMolWt': 180.042258736}"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.utils import utils\n",
    "import json\n",
    "\n",
    "aspirin = utils.smiles2ctab('O=C(Oc1ccccc1C(=O)O)C')\n",
    "descs = json.loads(utils.chemblDescriptors(aspirin))[0]\n",
    "descs"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c994bc17",
   "metadata": {},
   "source": [
    "## Compute structural alerts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "9777bd29",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'alert_id': 1030, 'alert_name': 'Ester', 'set_name': 'MLSMR', 'smarts': '[#6]-C(=O)O-[#6]'}\n",
      "{'alert_id': 1069, 'alert_name': 'vinyl michael acceptor1', 'set_name': 'MLSMR', 'smarts': '[#6]-[CH1]=C-C(=O)[#6,#7,#8]'}\n"
     ]
    }
   ],
   "source": [
    "from chembl_webresource_client.utils import utils\n",
    "import json\n",
    "\n",
    "# Convert aspirin's SMILES to a molblock, then decode the JSON response\n",
    "mol = utils.smiles2ctab(\"O=C(Oc1ccccc1C(=O)O)C\")\n",
    "alerts = json.loads(utils.structuralAlerts(mol))\n",
    "for a in alerts[0]:\n",
    "    print(a)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3101c10e",
   "metadata": {},
   "source": [
    "## Standardize a molecule"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "e599de0f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'standard_molblock': '\\n     RDKit          2D\\n\\n 19 17  0  0  0  0  0  0  0  0999 V2000\\n    0.0000   -3.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n   -5.5170   -1.9538    0.0000 Na  0  0  0  0  0 15  0  0  0  0  0  0\\n   -2.9244   -0.4442    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n   -2.0602    0.0590    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -2.0638    1.0590    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n   -1.1924   -0.4380    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -0.3282    0.0652    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    0.5396   -0.4318    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    1.4038    0.0714    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    1.4002    1.0714    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    2.2644    1.5744    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    2.2608    2.5744    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n    0.5324    1.5682    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -0.3318    1.0652    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    5.0649    2.8712    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    4.8623    1.8920    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n    3.8683    1.7822    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n    3.4567    2.6936    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n    4.1963    3.3666    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n  3  4  1  0\\n  4  5  2  0\\n  4  6  1  0\\n  6  7  1  0\\n  7  8  2  0\\n  8  9  1  0\\n  9 10  2  0\\n 10 11  1  0\\n 11 12  1  0\\n 10 13  1  0\\n 13 14  2  0\\n 14  7  1  0\\n 15 16  2  0\\n 16 17  1  0\\n 17 18  2  0\\n 18 19  1  0\\n 19 15  1  0\\nM  CHG  2   2   1   3  -1\\nM  END\\n'}]"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.utils import utils\n",
    "import json\n",
    "\n",
    "mol = utils.smiles2ctab(\"[Na]OC(=O)Cc1ccc(C[NH3+])cc1.c1nnn[n-]1.O\")\n",
    "st = json.loads(utils.standardize(mol))\n",
    "st"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3c40678d",
   "metadata": {},
   "source": [
    "## Calculate the parent molecule"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "117e3b57",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'parent_molblock': '\\n     RDKit          2D\\n\\n 18 18  0  0  0  0  0  0  0  0999 V2000\\n   -5.5170   -1.9538    0.0000 Na  0  0  0  0  0  1  0  0  0  0  0  0\\n   -2.9244   -0.4442    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n   -2.0602    0.0590    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -2.0638    1.0590    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\\n   -1.1924   -0.4380    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -0.3282    0.0652    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    0.5396   -0.4318    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    1.4038    0.0714    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    1.4002    1.0714    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    2.2644    1.5744    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    2.2608    2.5744    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n    0.5324    1.5682    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n   -0.3318    1.0652    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    5.0649    2.8712    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\\n    4.8623    1.8920    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n    3.8683    1.7822    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n    3.4567    2.6936    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n    4.1963    3.3666    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0\\n  1  2  1  0\\n  2  3  1  0\\n  3  4  2  0\\n  3  5  1  0\\n  5  6  1  0\\n  6  7  2  0\\n  7  8  1  0\\n  8  9  2  0\\n  9 10  1  0\\n 10 11  1  0\\n  9 12  1  0\\n 12 13  2  0\\n 13  6  1  0\\n 14 15  2  0\\n 15 16  1  0\\n 16 17  2  0\\n 17 18  1  0\\n 18 14  1  0\\nM  END\\n',\n",
       "  'exclude': False}]"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chembl_webresource_client.utils import utils\n",
    "import json\n",
    "\n",
    "mol = utils.smiles2ctab(\"[Na]OC(=O)Cc1ccc(C[NH3+])cc1.c1nnn[n-]1.[Na]\")\n",
    "par = json.loads(utils.getParent(mol))\n",
    "par"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
